Generative and Discriminative Text Classification with Recurrent Neural Networks
We empirically characterize the performance of discriminative and generative
LSTM models for text classification. We find that although RNN-based generative
models are more powerful than their bag-of-words ancestors (e.g., they account
for conditional dependencies across words in a document), they have higher
asymptotic error rates than discriminatively trained RNN models. However, we
also find that generative models approach their asymptotic error rate more
rapidly than their discriminative counterparts---the same pattern that Ng &
Jordan (2001) proved holds for linear classification models that make more
naive conditional independence assumptions. Building on this finding, we
hypothesize that RNN-based generative classification models will be more robust
to shifts in the data distribution. This hypothesis is confirmed in a series of
experiments in zero-shot and continual learning settings that show that
generative models substantially outperform discriminative models.
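The contrast being measured can be made concrete. Below is a minimal sketch, assuming PyTorch, of the two model families: a discriminative LSTM that models p(y|x) directly, and a generative LSTM that models p(x|y) with a class-conditional language model and classifies via Bayes' rule under a uniform class prior. All sizes and module choices are illustrative assumptions, not the paper's configuration.

```python
# Sketch: discriminative vs. generative LSTM text classifiers.
# Hyperparameters and module choices are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, HIDDEN, CLASSES = 10_000, 256, 2  # placeholder sizes

class DiscriminativeLSTM(nn.Module):
    """Models p(y | x): encode the document, then classify."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HIDDEN)
        self.lstm = nn.LSTM(HIDDEN, HIDDEN, batch_first=True)
        self.out = nn.Linear(HIDDEN, CLASSES)

    def forward(self, tokens):                  # tokens: (batch, seq)
        _, (h, _) = self.lstm(self.embed(tokens))
        return self.out(h[-1])                  # logits over classes

class GenerativeLSTM(nn.Module):
    """Models p(x | y): a class-conditional language model,
    conditioned via a learned per-class initial hidden state."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HIDDEN)
        self.class_state = nn.Embedding(CLASSES, HIDDEN)
        self.lstm = nn.LSTM(HIDDEN, HIDDEN, batch_first=True)
        self.out = nn.Linear(HIDDEN, VOCAB)

    def log_prob(self, tokens, y):              # log p(x | y), summed over words
        h0 = self.class_state(y).unsqueeze(0)   # (1, batch, hidden)
        c0 = torch.zeros_like(h0)
        states, _ = self.lstm(self.embed(tokens[:, :-1]), (h0, c0))
        logits = self.out(states)
        logp = -F.cross_entropy(
            logits.reshape(-1, VOCAB), tokens[:, 1:].reshape(-1),
            reduction="none").reshape(tokens.size(0), -1)
        return logp.sum(dim=1)

    def classify(self, tokens):                 # argmax_y p(x | y), uniform prior
        scores = torch.stack([
            self.log_prob(tokens,
                          torch.full((tokens.size(0),), y, dtype=torch.long))
            for y in range(CLASSES)], dim=1)
        return scores.argmax(dim=1)
```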
Learning Word Representations with Hierarchical Sparse Coding
We propose a new method for learning word representations using hierarchical
regularization in sparse coding inspired by the linguistic study of word
meanings. We show an efficient learning algorithm based on stochastic proximal
methods that is significantly faster than previous approaches, making it
possible to perform hierarchical sparse coding on a corpus of billions of word
tokens. Experiments on various benchmark tasks---word similarity ranking,
analogies, sentence completion, and sentiment analysis---demonstrate that the
method outperforms or is competitive with state-of-the-art methods. Our word
representations are available at
\url{http://www.ark.cs.cmu.edu/dyogatam/wordvecs/}.
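The stochastic proximal machinery can be illustrated with the simplest case: a proximal-gradient step under a plain L1 penalty, whose proximal operator is elementwise soft-thresholding. This is a generic NumPy sketch; the paper's actual regularizer is hierarchical (tree-structured), which this example does not implement.

```python
# Sketch: one proximal-gradient step for sparse coding with an L1 penalty.
# Generic machinery only; the paper uses a hierarchical regularizer instead.
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_step(a, D, x, lam, lr):
    """One step on min_a 0.5*||x - D a||^2 + lam*||a||_1.

    D: (dim, k) dictionary, a: (k,) current code, lr: step size.
    """
    grad = D.T @ (D @ a - x)          # gradient of the smooth reconstruction loss
    return soft_threshold(a - lr * grad, lr * lam)

# Toy usage with random data (shapes are assumptions, for illustration only).
rng = np.random.default_rng(0)
D = rng.standard_normal((50, 20))
x = rng.standard_normal(50)
a = np.zeros(20)
for _ in range(100):
    a = prox_step(a, D, x, lam=0.1, lr=0.01)
```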
The Distributional Hypothesis Does Not Fully Explain the Benefits of Masked Language Model Pretraining
We analyze the masked language modeling pretraining objective function from
the perspective of the distributional hypothesis. We investigate whether better
sample efficiency and the better generalization capability of models pretrained
with masked language modeling can be attributed to the semantic similarity
encoded in the pretraining data's distributional property. Via a synthetic
dataset, our analysis suggests that the distributional property indeed leads to
better sample efficiency of pretrained masked language models, but does not
fully explain the generalization capability. We also conduct analyses over two
real-world datasets and demonstrate that the distributional property does not
explain the generalization ability of pretrained natural language models
either. Our results illustrate our limited understanding of model pretraining
and provide future research directions.
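For readers unfamiliar with the objective under analysis: masked language model pretraining corrupts a fraction of input tokens and trains the model to recover the originals. A minimal sketch of the standard BERT-style masking recipe follows; the token ids and the 80/10/10 split are the conventional defaults, assumed here rather than taken from this paper.

```python
# Sketch: BERT-style input masking for masked language modeling.
# The 80/10/10 proportions follow the standard recipe; token ids are
# placeholder assumptions.
import random

MASK_ID, VOCAB_SIZE = 103, 30_000  # assumed [MASK] id and vocabulary size

def mask_tokens(tokens, mask_prob=0.15):
    """Return (corrupted tokens, labels); label -100 means no prediction."""
    corrupted, labels = list(tokens), [-100] * len(tokens)
    for i, tok in enumerate(tokens):
        if random.random() < mask_prob:
            labels[i] = tok                      # model must predict the original
            r = random.random()
            if r < 0.8:
                corrupted[i] = MASK_ID           # 80%: replace with [MASK]
            elif r < 0.9:
                corrupted[i] = random.randrange(VOCAB_SIZE)  # 10%: random token
            # remaining 10%: keep the original token unchanged
    return corrupted, labels
```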
Understanding In-Context Learning with a Pelican Soup Framework
Many existing theoretical analyses of in-context learning for natural
language processing are based on latent variable models that leave gaps
between theory and practice. We aim to close these gaps by proposing a
theoretical framework, the Pelican Soup Framework. In this framework, we
introduce (1) the notion of a common sense knowledge base, (2) a general
formalism for natural language classification tasks, and (3) the notion of
meaning association. Under this framework, we can establish a
$\mathcal{O}(1/T)$ loss bound for in-context learning, where $T$ is the number
of example-label pairs in the demonstration. Compared with previous works, our
bound reflects the effect of the choice of verbalizers and the effect of
instruction tuning. An additional notion of \textit{atom concepts} enables our
framework to explain the generalization to tasks unseen in the
language model training data. Finally, we propose a toy setup, Calcutec, and a
digit addition task that mimics types of distribution shifts a model needs to
overcome to perform in-context learning. We also experiment with GPT2-Large on
real-world NLP tasks. Our empirical results demonstrate the efficacy of our
framework to explain in-context learning